Statistical Parsing with a Context-Free Grammar and Word Statistics
Author
Abstract
We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previous schemes. As this is the third in a series of parsers by different authors that are similar enough to invite detailed comparisons but different enough to give rise to different levels of performance, we also report on some experiments designed to identify what aspects of these systems best explain their relative performance.

Introduction

We present a statistical parser that induces its grammar and probabilities from a hand-parsed corpus (a tree-bank). Parsers induced from corpora are of interest both as simple exercises in machine learning and because they are often the best parsers obtainable by any method. That is, if one desires a parser that produces trees in the tree-bank style and that assigns some parse to every sentence thrown at it, then parsers induced from tree-bank data are currently the best.

Naturally there are also drawbacks. Creating the requisite training corpus, or tree-bank, is a Herculean task, so there are not many to choose from. (In this paper we use the Penn Wall Street Journal Treebank [6].) Thus the variety of parse types generated by such systems is limited.

At the same time, the dearth of training corpora has at least one positive effect. Several systems now exist to induce parsers from this data, and it is possible to make detailed comparisons of these systems, secure in the knowledge that all of them were designed to start from the same data and accomplish the same task. Thus an unusually large portion of this paper is devoted to the comparison of our parser to previous work, in which we attempt to trace performance differences to particular decisions made in the construction of these parsing systems.

The Probabilistic Model

The system we present here is probabilistic in that it returns the parse $\pi$ of a sentence $s$ that maximizes $p(\pi \mid s)$. More formally, we want our parser to return $\mathcal{P}(s)$, where

$$\mathcal{P}(s) = \arg\max_{\pi} \frac{p(\pi, s)}{p(s)} = \arg\max_{\pi} p(\pi, s) \qquad (1)$$

The first expression rewrites $p(\pi \mid s)$ as $p(\pi, s)/p(s)$; since $p(s)$ is the same for every parse of $s$, it does not affect the maximization and can be dropped. Thus the parser operates by assigning probabilities $p(\pi, s)$ to the sentence $s$ under all its possible parses $\pi$ (or at least all the parses it constructs) and then choosing the parse for which $p(\pi, s)$ is highest.

To illustrate how our model assigns a probability to a sentence under a given parse, consider the sentence "Corporate profits rose." under the parse shown in Figure 1.

Figure 1: Parse of a simple sentence
(s:rose (np:profits (adj:corporate Corporate) (n:profits profits)) (vp:rose (v:rose rose)) (fpunc:. .))

We can think of a parse as a bag of context-free grammar rules specifying how each parse constituent is expanded. Indeed, this is exactly how our system treats it, since it uses a context-free grammar and finds a set of (we hope) high-probability parses for the sentence. In what follows we pretend that the probability model is applied separately to each possible parse. In actuality this is too inefficient; once the set of parses has been found, their probabilities are determined in one bottom-up pass.

Returning to Figure 1, at each non-terminal node we note the type of node (e.g., a noun phrase, np) and

*This research was supported in part by NSF grant IRI-9319516 and by ONR grant N0014-96-1-0549. Copyright © 1997, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
From: AAAI-97 Proceedings.
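The excerpt breaks off before the model's details, so the Python sketch below illustrates only the two ideas stated so far: a parse viewed as a bag of context-free rules, and parse selection by Equation (1). It assumes a plain PCFG, in which $p(\pi, s)$ is simply the product of rule probabilities; this is a simplification chosen for illustration, not the paper's model, whose node labels also carry head words (e.g., np:profits) that the sketch ignores. The tree encoding, the function names `rules_of`, `joint_prob` and `best_parse`, and all rule probabilities are hypothetical.

```python
import math
from collections import Counter

# Hypothetical encoding: a tree is a tuple (label, child, child, ...);
# a child that is a plain string is a word (leaf).


def rules_of(tree):
    """Return the bag (multiset) of context-free rules used in `tree`.

    Each rule is (parent_label, (child_label_1, ..., child_label_k)),
    where a word child contributes the word itself.
    """
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    bag = Counter({(label, child_labels): 1})
    for child in children:
        if not isinstance(child, str):
            bag.update(rules_of(child))  # add the subtree's rule counts
    return bag


def joint_prob(tree, rule_prob):
    """p(pi, s) under a plain PCFG (assumption): the product of the
    probabilities of every rule in the bag; unseen rules get probability 0."""
    return math.prod(rule_prob.get(rule, 0.0) ** count
                     for rule, count in rules_of(tree).items())


def best_parse(candidates, rule_prob):
    """Equation (1): argmax over candidate parses of p(pi, s).

    p(s) is the same for every parse of one sentence, so dividing by it
    would not change which parse wins.
    """
    return max(candidates, key=lambda t: joint_prob(t, rule_prob))


# Two candidate parses of "Corporate profits rose ." (head words omitted).
parse_a = ("s", ("np", ("adj", "Corporate"), ("n", "profits")),
           ("vp", ("v", "rose")), ("fpunc", "."))
parse_b = ("s", ("np", ("n", "Corporate"), ("n", "profits")),
           ("vp", ("v", "rose")), ("fpunc", "."))

# Made-up rule probabilities, for illustration only.
rule_prob = {
    ("s", ("np", "vp", "fpunc")): 1.0,
    ("np", ("adj", "n")): 0.3,
    ("np", ("n", "n")): 0.1,
    ("vp", ("v",)): 0.4,
    ("adj", ("Corporate",)): 0.01,
    ("n", ("Corporate",)): 0.001,
    ("n", ("profits",)): 0.02,
    ("v", ("rose",)): 0.05,
    ("fpunc", (".",)): 1.0,
}

best = best_parse([parse_b, parse_a], rule_prob)
print(best is parse_a)  # True
```

Because every candidate parse of the same sentence shares $p(s)$, comparing joint probabilities is enough. Scoring each parse separately, as this sketch does, mirrors the text's simplifying pretense; the actual system instead determines these probabilities in one bottom-up pass over the set of parses it has found.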